METHODS, NUCLEOTIDE SEQUENCES AND HOST CELLS FOR
ASSAYING EXOGENOUS AND ENDOGENOUS PROTEASE ACTIVITY 

TECHNICAL FIELD OF INVENTION 
The invention relates to methods for assaying
exogenous protease activity in a host cell transformed
with nucleotide sequences encoding that protease and a
specialized substrate. It also relates to methods for
assaying endogenous protease activity in a host cell
transformed with nucleotide sequences encoding a
specialized substrate. When these nucleotide sequences
are expressed, the exogenous or endogenous protease
cleaves the substrate and releases a polypeptide that is
secreted out of the cell, where it can be easily
quantitated using standard assays. The methods and
transformed host cells of this invention are particularly
useful for identifying inhibitors of the exogenous and
endogenous proteases. If the protease is a protease from
an infectious agent or is characteristic of a diseased
state, inhibitors identified by these methods are
potential pharmaceutical agents for treatment or
prevention of the disease.

BACKGROUND ART 

Proteases play an important role in the
regulation of many biological processes. They also play
a major role in disease. In particular, proteolysis of
primary polypeptide precursors is essential to the
replication of several infectious viruses, including HIV
and HCV. These viruses encode proteins that are
initially synthesized as large polyprotein precursors.
Those precursors are ultimately processed by the viral
protease to mature viral proteins. In light of this,
researchers have begun to concentrate on inhibition of
viral proteases as a potential treatment for certain
viral diseases.

Proteases also play a role in non-infectious
diseases. For example, changes in normal cellular
function may cause an undesirable increase or decrease in
proteolytic activity. This often leads to a disease
state.

The ability to detect viral or mutant protease
activity in a quick and simple assay is important in the
biochemical characterization of these proteases and in
the screening and identification of potential inhibitors.
Several of these assays have been described in the art.

T. M. Block et al., Antimicrob. Agents
Chemother., 34, pp. 2337-41 (1990) described a prototype
assay for screening potential HIV protease inhibitors.
This assay involved cloning the HIV protease recognition
sequence into the tetracycline resistance gene (TetR) of
pBR322 and cotransforming E. coli with the modified TetR
gene and the gene encoding the HIV protease.
Co-expression of these two genes caused tetracycline
sensitivity. Potential inhibitors were identified by the
ability to restore tetracycline resistance to the
transformed bacteria.

E. Sarubbi et al., FEBS Lett., 279, pp. 265-69
(1991) described another assay for detecting HIV protease
inhibitors that utilized an HIV-1 Gag-β-galactosidase
fusion protein and a monoclonal antibody that bound to
the fusion protein in the gag region. Coexpression of
the HIV protease and the fusion protein led to cleavage
of the latter and abolished monoclonal antibody binding.
Potential inhibitors were identified by increased binding
of the monoclonal antibody to the fusion protein.

T. A. Smith et al., Proc. Natl. Acad. Sci. USA,
88, pp. 5159-62 (1991), B. Dasmahapatra et al., Proc.
Natl. Acad. Sci. USA, 89, pp. 4159-62 (1992) and M. G.
Murray et al., Gene, 134, pp. 123-28 (1993) each
described protease assay systems utilizing the yeast GAL4
protein. Each of these authors described inserting a
protease cleavage site in between the DNA binding domain
and the transcriptional activating domain of GAL4.
Cleavage of that site by a coexpressed protease renders
GAL4 transcriptionally inactive, leading to the inability
of the transformed yeast to metabolize galactose.

H.-D. Liebig et al., Proc. Natl. Acad. Sci.
USA, 88, pp. 5979-83 (1991) disclosed the use of a fusion
protein consisting of a self-cleaving protease fused to
the α fragment of β-galactosidase to assay protease
activity. Active forms of the protease cleaved
themselves off of the fusion protein and the resulting
protein was able to carry out α-complementation. Fusions
containing inactive protease were unable to perform
α-complementation.

Y. Komoda et al., J. Virol., 68, pp. 7351-57
(1994) described an assay to identify HCV protease
cleavage sites within the HCV precursor polyprotein.
These authors created chimeric proteins comprising
various portions of the HCV precursor polyprotein
inserted in between the E. coli maltose binding protein
and dihydrofolate reductase. If the HCV portion of these
chimeras contained a cleavage site, the chimera would be
cleaved when it was coexpressed with HCV protease in E.
coli. Cleavage of the chimera was determined by
SDS-polyacrylamide gel electrophoresis of E. coli lysates.

Y. Hirowatari et al., Anal. Biochem., 225, pp.
113-120 (1995) described another assay to detect HCV
protease activity. In this assay, the substrate, HCV
protease and a reporter gene are cotransfected into COS
cells. The substrate is a fusion protein consisting of
(HCV NS2)-(DHFR)-(HCV NS3 cleavage site)-Tax1. The
reporter gene is chloramphenicol acetyltransferase (CAT)
under control of the HTLV-1 long terminal repeat (LTR) and
resides in the cell nucleus following expression. The
uncleaved substrate is expressed as a membrane-bound
protein on the surface of the endoplasmic reticulum due
to the HCV NS2 portion. Upon cleavage, the released Tax1
protein translocates to the nucleus and activates CAT
expression by binding to the HTLV-1 LTR. Protease
activity is determined by measuring CAT activity in a
cell lysate.

Despite these developments, no one has yet
developed a protease assay system that can be carried out
with higher eukaryotic cells, is quantitative, and
does not require cell lysis prior to quantitation.
Avoiding cell lysis prior to quantitation is desirable in
that the assay may be performed more rapidly and with
less manipulation. Also, lysis can often lead to
aberrant results. Thus, there is a need for an accurate
and quantitative cellular-based protease assay that can
be carried out in a higher eukaryotic cell without cell
lysis.

SUMMARY OF THE INVENTION

The present invention fulfills this need by
providing methods for assaying exogenous protease
activity in a host cell expressing that protease. The
methods involve utilizing a host cell expressing a first
nucleotide sequence encoding an exogenous protease and a
second nucleotide sequence encoding an artificial
substrate for that protease. The artificial substrate
comprises a cleavage site for the protease situated at or
near the natural maturation site of a pre-polypeptide,
part of which is secreted following proteolytic
processing. When the host is grown under conditions that
cause expression of the first and second nucleotide
sequences, the exogenous protease cuts the artificial
substrate at the cleavage site, releasing the mature
polypeptide which is secreted into the growth media. The
growth media is then isolated and assayed for the mature
polypeptide.

Alternatively, the invention may be utilized to
assay endogenous proteases, especially when quantitation
of those proteases is difficult due to the inability to
detect or distinguish between the cleaved and uncleaved
native substrate.

According to one aspect of the invention, the
assay is used to quantitate an exogenous viral protease.
Such assays are particularly useful as replacements for
current viral protease assays that require the use of
intact, infectious virus or where no simple viral model
is available to detect viral protease activity. These
assays may be used to identify and assay potential
inhibitors of viral proteases which, in turn, may be used
as pharmaceutical agents for the treatment or prevention
of viral disease.

This invention also provides host cells
transformed with nucleotide sequences encoding an
exogenous protease and a corresponding substrate, as
well as those transformed with a specialized substrate
for an endogenous protease. These hosts may be used in
the methods of this invention.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 depicts the structure of pcDL-SRα296.

Figure 2 depicts the structure of a derivative
of pKV containing the pre-IL-1β coding sequence.

Figure 3, panel A, is an immunoblot of cell
lysates from cells transfected with a NS3-wild-type or
NS3-mutant NS3-4A-4B-IL-1β or cotransfected with a
NS3-mutant NS3-4A-4B-IL-1β and a NS3 (1-180) construct
probed with an anti-NS3 antibody. Figure 3, panel B, is an
immunoblot of the same cell lysates probed with an
anti-IL-1β antibody.

Figure 4 depicts the immunoprecipitation of the
media from 35S-labelled cells transfected with either a
NS3-wild-type or NS3-mutant NS3-4A-4B-IL-1β construct with
an anti-IL-1β antibody.

Figure 5 is an immunoblot of cell lysates from
cells co-transfected with NS3-4A and either a NS5A/5B- or
CSM-containing pre-IL-1β substrate probed with an
anti-IL-1β antibody.

Figure 6 depicts the immunoprecipitation of the
media from 35S-labelled cells co-transfected with NS3-4A
and either a NS5A/5B- or CSM-containing pre-IL-1β
substrate with an anti-IL-1β antibody.

Figure 7 depicts the inhibition of HCV NS3
protease cleavage of pre-IL-1β* by varying concentrations
of VH16075 and VH15924.



DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides a method for
assaying exogenous protease activity in a host cell
comprising the steps of:

(a) incubating a host cell transformed
with a first nucleotide sequence encoding an exogenous
protease and a second nucleotide sequence encoding an
artificial polypeptide substrate under conditions which
cause said exogenous protease and said artificial
substrate to be expressed;

wherein said substrate comprises:

(i) a cleavage site for said
exogenous protease; and

(ii) a polypeptide that is secreted
out of said cell following cleavage by said
exogenous protease;

(b) separating said host cell from its
growth media under non-lytic conditions; and

(c) assaying said growth media for the
presence of said secreted polypeptide.

As used herein, the term "exogenous protease"
means a protease not normally expressed by the host cell
used in the assay. That term includes full-length
proteases that are identical to those found in nature, as
well as catalytically active fragments thereof.

The choice of exogenous protease to be assayed
is left entirely to the user. The
only requirements are that: (1) the specificity of the
enzyme, in terms of the amino acid residues or sequences
at which it cleaves, be known; (2) the primary structure of at
least the catalytically active portion of the enzyme be
known; and (3) a nucleotide sequence encoding at least an
enzymatically active portion of the protease exists or
can be made and can be expressed in a heterologous host
cell.

According to a preferred embodiment, the
exogenous protease is a protease encoded by a pathogenic
agent. More preferred is a protease encoded by a
pathogenic virus. Most preferably, the exogenous
protease is the NS3 protease of hepatitis C virus
("HCV").

HCV NS3 protease is a 70 kilodalton protein
that is involved in the maturation of viral polypeptides
following infection. It is a serine protease which has a
Cys-X or Thr-X substrate specificity. It has also been
shown that the protease activity of NS3 resides
exclusively in the N-terminal 180 amino acids of the
enzyme. Therefore, nucleotide sequences encoding
anywhere from the first 180 amino acids of NS3 up to the
full length enzyme may be utilized in the methods of this
invention. Active fragments of other known proteases may
also be used as an alternative to the full-length
protease.

According to an alternative embodiment, the
invention provides a method for assaying endogenous
protease activity in a host cell comprising the steps of:

a) incubating a host cell transformed with a
nucleotide sequence encoding an artificial polypeptide
substrate under conditions which cause said artificial
substrate to be expressed;

wherein said substrate comprises:

i) a cleavage site for said endogenous
protease; and

ii) a polypeptide that is secreted out of
said cell following cleavage by said endogenous
protease;

b) separating said host cell from its growth
media under non-lytic conditions; and

c) assaying said growth media for the
presence of said secreted polypeptide.

The term "endogenous protease", as used
throughout this application, refers to a protease that
is normally expressed by the host cell. It includes both
wild type proteases and naturally occurring
mutant proteases with increased or decreased activity.

According to the invention, the artificial
polypeptide substrate used in the methods must comprise a
cleavage site for the protease to be assayed and must be
secreted out of the cell following cleavage by that
protease. Preferably, the DNA encoding the artificial
substrate is derived from a gene or cDNA encoding a
naturally occurring polypeptide that is normally cleaved
and then secreted out of a cell, but not necessarily
cleaved by the cell utilized in the assay.

The DNA encoding that polypeptide is then
modified by inserting, in frame with the polypeptide
coding sequence, nucleotides encoding a cleavage site
that is recognized by the exogenous protease to be
tested. If the cell utilized in the assay is capable of
cleaving the substrate at its native cleavage site, then
the nucleotides encoding the polypeptide's native
cleavage site must be altered so as to render it
uncleavable by endogenous proteases.

The protease cleavage site in the artificial
substrate is preferably inserted within 60 amino acids on
either side of the native cleavage site. Preferably, the
artificial cleavage site is inserted N-terminal to the
native cleavage site. Alternatively, the protease
cleavage site can be created by mutating the native
polypeptide sequence. Such mutation is preferably
performed on a sequence within 60 amino acids of the
native cleavage site, more preferably on a sequence
N-terminal to and within 8-10 amino acids of the native
cleavage site; or is a mutation of the native cleavage
site itself.

Alteration of the native cleavage site to
render it uncleavable by the host cell may be achieved,
if necessary, by insertion, deletion or mutation of
nucleotides at that site.

Insertion of the protease cleavage site into
the substrate and alteration of its native cleavage site
may be accomplished by any combination of a number of
recombinant DNA techniques well known in the art, such as
site directed mutagenesis or standard restriction
digest/ligation cloning techniques. Alternatively, the
DNA encoding all or part of the artificial substrate may
be produced synthetically using a commercially available
automated oligonucleotide synthesizer. Regardless of the
techniques used to insert the protease cleavage site into
the substrate polypeptide or alter its native cleavage
site, it is crucial that the reading frame of the
substrate polypeptide remain intact, without the
insertion of stop codons.
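
The reading-frame requirement can be checked computationally
before a construct is made. The following Python sketch is
illustrative only and is not part of the disclosed methods. The
function names and the toy sequences are hypothetical, and it
assumes the coding sequence is supplied 5' to 3' beginning at
the first codon and ending with its own stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna):
    """Split a DNA string into consecutive triplets."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def insert_in_frame(cds, insert, codon_index):
    """Insert `insert` after `codon_index` complete codons of `cds`.

    Raises ValueError if the insert would shift the reading frame.
    """
    if len(insert) % 3 != 0:
        raise ValueError("insert length must be a multiple of 3")
    pos = codon_index * 3
    return cds[:pos] + insert + cds[pos:]


def frame_is_intact(cds):
    """True if the length is a multiple of 3 and no stop codon
    occurs before the final codon."""
    if len(cds) % 3 != 0:
        return False
    return not any(c in STOP_CODONS for c in codons(cds)[:-1])


# Toy example (hypothetical sequence): add Cys-Ser codons after codon 3.
toy_cds = "ATGGCACATGATGCTTAA"
modified = insert_in_frame(toy_cds, "TGTTCT", 3)
```

A construct for which `frame_is_intact` returns False either
has a shifted frame or carries a premature stop codon, and
would not be expected to yield a full-length substrate.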

The secretable polypeptide from which
the artificial substrate is derived may be selected from
any pre-polypeptide that can be cleaved by the host cell
used for the assay, and whose resulting mature
polypeptide can be secreted out of that cell, but which
is not normally present in that cell. For use in
eukaryotic cells there are two main categories of
pre-polypeptide from which the choice can be made.

The first and preferred category comprises
pre-polypeptides that are expressed and cleaved in the
cytoplasmic compartment. Among these proteins are
interleukin-1β (IL-1β), interleukin-1α (IL-1α), basic
fibroblast growth factor (bFGF) and endothelial-monocyte
activating polypeptide II (EMAP-II). The advantage of
using cytoplasmic pre-polypeptides is that there is a
much greater likelihood that the protease and the
artificial substrate will share the same subcellular
compartment. This is because most proteases of interest
are also cytoplasmic proteins and thus will have access
to the artificial substrate.

The second category of pre-polypeptides that
may be used to create artificial substrates used in the
methods of this invention are those that are expressed on
the cell surface through the organellar secretory pathway
and are retained on the cell surface. Such substrates
are useful to assay endogenous and exogenous cell
membrane proteases, as well as exogenous proteases that
are similarly engineered to be cell membrane proteins.
The technique of creating a cell membrane protease or
substrate involves cloning a leader peptide (i.e., signal
sequence) onto the N-terminus of the substrate or
protease and a hydrophobic, membrane anchor sequence
(either a transmembrane domain or a
glycosylphosphatidylinositol anchor sequence) onto the
C-terminus. The resulting substrate is a cell membrane
protein with an extracellularly located cleavage site.
When cleaved by a cell membrane protease on the same or a
neighboring cell, the secreted polypeptide portion of the
substrate is released into the media.

Examples of sequences that may be used for
anchoring these proteins in the membrane are the
transmembrane domains of TNFα precursor [Nedospasov et
al., Cold Spring Harb. Symp. Quant. Biol., 51, pp. 611-24
(1986)], SP-C precursor [Keller et al., Biochem J., 277,
pp. 493-99 (1991)], or alkaline phosphatase [Berger et
al., Proc. Natl. Acad. Sci. USA, 86, pp. 1457-60 (1989)].

Techniques for cloning a signal sequence onto a
cytoplasmic protein have been well documented [see, for
example, Kizer and Tropsha, BBRC, 174, pp. 586-92 (1991);
Jost et al., J. Biol. Chem., 269, pp. 26267-72 (1994)
(expression and secretion of functional single chain Fv
molecules using immunoglobulin light chain leader
sequence); and Sasada et al., Cell Structure Function,
13, pp. 129-41 (1988) (secretion of human EGF and IgE in
mammalian cells using an IL-2 leader sequence)], as have
techniques for cloning transmembrane anchor sequences
onto cytoplasmic proteins [Berger et al., supra; Oda et
al., Biochem J., 301, pp. 577-83 (1984)]. By combining
these two techniques, the protease or substrate of
interest can be converted from a cytoplasmic protein into
a cell surface membrane protein.

In order to ensure that the substrate and
protease will have access to one another, according to
an alternate embodiment of the invention the artificial
substrate and an exogenous protease to be assayed may be
encoded as part of a single polyprotein. That
polyprotein may be a cytoplasmic or a membrane protein,
as long as the substrate and protease domains reside in
the same cellular compartment.

The choice of host cell to use in this method
is virtually unlimited. Any cell that can grow in
culture, be transformed or transfected with heterologous
nucleotide sequences and can express those sequences may
be employed in this method. These include bacteria, such
as E. coli and Bacillus; yeast and other fungi; plant
cells; insect cells; and mammalian cells. In addition,
expression of either of those sequences in higher
eukaryotic host cells may be transient or stable.
Preferably, the host cell is a higher eukaryotic cell
that is incapable of cleaving the substrate at its native
cleavage site. Preferably, the host cell is a mammalian
cell. Most preferably, the host cell is a COS cell.

It will be apparent that the specific choice of
cell is governed by the particular protease to be assayed
and by the particular artificial substrate used. In
embodiments that assay an exogenous protease, one obvious
limitation is that the endogenous cellular enzymes of the
chosen host must be unable to cleave the artificial
substrate to any significant extent. The endogenous rate
of artificial substrate cleavage may be determined by
transforming the selected host cell with only the
nucleotide sequence coding for the artificial substrate
and then growing that host under conditions which cause
expression of that nucleotide sequence and which would
cause expression of the exogenous protease-encoding
nucleotide sequence if that sequence were present. The
growth media of the cell is then assayed for the presence
of the secreted polypeptide portion of the substrate. In
assays that measure exogenous protease activity, control
cells (no exogenous protease expressed) should secrete
less than 10% of the total amount of expressed substrate
(due to endogenous cleavage and, in assays that do not
distinguish between cleaved and uncleaved substrates,
leaching of uncleaved substrate out of the cell) in order
to be useful in the methods of this invention. When an
endogenous protease is assayed, a control for
non-specific substrate cleavage is a cell transformed
with a substrate that contains a mutation at the cleavage
site. This mutation renders the substrate uncleavable by
the specific endogenous protease being assayed, but still
susceptible to non-specific cleavage. As with assays for
exogenous proteases, control cells should secrete less
than 10% of the total amount of expressed substrate.
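
The 10% background criterion for control cells amounts to a
simple ratio test. The Python sketch below is illustrative
only; the function names are hypothetical, and it assumes that
the secreted amount and the total expressed amount have been
measured in the same units.

```python
def background_fraction(secreted_amount, total_expressed):
    """Fraction of expressed substrate found in the media of control cells."""
    if total_expressed <= 0:
        raise ValueError("total expressed substrate must be positive")
    return secreted_amount / total_expressed


def control_is_acceptable(secreted_amount, total_expressed, limit=0.10):
    """True if control cells secrete less than `limit` (default 10%)
    of the total amount of expressed substrate."""
    return background_fraction(secreted_amount, total_expressed) < limit
```

For example, a control culture that secretes 0.2 units out of
4.0 units expressed has a background of 5% and would pass,
whereas one that secretes 0.5 units out of 4.0 (12.5%) would
not.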

In order to quantitate the protease activity,
the amount of secreted substrate polypeptide is measured.
Quantitation may be achieved by subjecting the growth
media to any of the various standard assay procedures
that are well known in the art. These include, but are
not limited to, immunoblotting, ELISA,
immunoprecipitation, RIA, other colorimetric assays,
enzymatic assay or bioassay. Quantitation techniques
that employ antibodies preferably utilize antibodies
that have low cross-reactivity with the uncleaved
substrate. Preferably cross-reactivity is less than 20%
and more preferably less than 5%.

According to another embodiment, the present
invention provides a method of screening for protease
inhibitors. In this method, the above-described assay is
carried out in the presence and absence of potential
inhibitors of the protease. When the assays of this
invention are performed using cells which transiently
express the substrate and protease, the inhibitor is
preferably added immediately after transfection with the
protease and substrate-encoding DNA sequences. When
stable transformants are used, the potential inhibitor is
added at the beginning of the assay. The efficacy of the
potential inhibitor (and its ability to cross the cell
membrane) is determined by comparing the amount of
secreted substrate polypeptide present in the media of
cells assayed in its presence versus its absence.
Compounds which cause at least a 90% reduction in the
amount of secreted substrate polypeptide are potentially
useful protease inhibitors.
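
The comparison described above can be expressed as a percent
reduction in secreted substrate polypeptide. The following
Python sketch is a minimal illustration under stated
assumptions, not a prescribed data analysis: the function names
are hypothetical, and both measurements are assumed to come
from the same assay in the same units.

```python
def percent_reduction(signal_without, signal_with):
    """Percent decrease in secreted polypeptide caused by a compound."""
    if signal_without <= 0:
        raise ValueError("signal without inhibitor must be positive")
    return 100.0 * (1.0 - signal_with / signal_without)


def is_candidate_inhibitor(signal_without, signal_with, cutoff=90.0):
    """True if the compound reduces secretion by at least `cutoff` percent."""
    return percent_reduction(signal_without, signal_with) >= cutoff
```

With these definitions, a compound that lowers the secreted
polypeptide from 2.0 to 0.5 units gives a 75% reduction and
falls short of the 90% threshold.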

In order that the invention described herein
may be more fully understood, the following examples are
set forth. It should be understood that these examples
are for illustrative purposes only and are not to be
construed as limiting this invention in any manner.

EXAMPLE 1
Construction Of Expression Plasmids
A. HCV NS3 Protease

We cloned the nucleotide sequence coding for
the entire, intact HCV NS3 protease, an NS3-4A
polyprotein or a truncated NS3 consisting of amino acids
1 to 180 into the mammalian expression plasmid pcDL-SRα
[Y. Takebe et al., Mol. Cell. Biol., 8, pp. 466-72
(1988)]. That plasmid contains an SV40 origin of
replication and an HTLV LTR enhancer/promoter sequence
which ultimately drives the high level expression of the
NS3 coding sequences (Figure 1).

The respective NS3 coding fragments (full
length NS3, NS3-4A polyprotein or truncated NS3 (amino
acids 1-181)) were obtained by PCR of the corresponding
portions of a full length HCV H strain cDNA (SEQ ID
NO:1). For each of the three coding fragments the
following 5' primer was used (SEQ ID NO:2):
5' GGACTAGTCTGCAGTCTAGAGCTCCATGGCGCCCATCACGGCGTACG 3'. The
fragment-specific 3' primers used were:
NS3 - (SEQ ID NO:3):
3' GAAGATCTGAATTCTAGATTTTACGTGACGACCTCCACGTCGGC 5';
NS3-4A - (SEQ ID NO:4):
3' GAAGATCTGAATTCTAGATTTTAGCACTCTTCCATCTCATCGAA 5'; and
NS3 (1-181) - (SEQ ID NO:5):
3' GAAGATCTGAATTCTAGATTTTAGGATCTCATGGTTGTCTCTAGG 5'. These
primers produced PCR-amplified fragments containing
multiple restriction sites at either end for ease of
cloning.

In order to ligate the fragments to the vector,
the vector was first cleaved with PstI and EcoRI to
remove a small fragment. The cut vector was then
purified and ligated to the respective PstI/EcoRI cut NS3
protease-encoding fragment.

B. IL-1β/NS3 Substrate

A derivative of plasmid pKV containing the
pre-IL-1β coding sequence has been described by P. K. Wilson
et al., Nature, 370, pp. 253-70 (1994). That plasmid
contains the SV40 origin of replication and the early
promoter. The pre-IL-1β sequence was cloned between the
SpeI and BglII sites shown in Figure 2.

We inserted a double stranded synthetic DNA
fragment (SEQ ID NO:6) which encoded 20 amino acids (SEQ
ID NO:7: GADTEDVVCCSMSYTWTGVH) and contained linkers at
both ends that included an ApaLI restriction site. The
DNA was cloned into the ApaLI site in pre-IL-1β (between
the codons for amino acids His115 and Asp116), immediately
upstream of the native cleavage site (located between
Asp116 and Ala117). The first 18 amino acids of the insert
correspond to the HCV peptide 5A/5B cleavage site. The
last two amino acids are encoded by the linker. The
inserted DNA maintained the reading frame of the native
pre-IL-1β protein. The resulting substrate is referred
to throughout the application as "pre-IL-1β*".

NS3 cleaves the inserted peptide in between the
cysteine and serine residues. Because the COS cells we
utilized in this assay were incapable of cleaving
pre-IL-1β (data not shown), we did not have to knock out
the native pre-IL-1β cleavage site.
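
The composition of the inserted peptide and the position of
the NS3 scissile bond can be illustrated with a short Python
sketch. This is a hedged illustration rather than a
description of the experimental work: the helper names are
hypothetical, and the 20-residue reading
GADTEDVVCCSMSYTWTGVH of SEQ ID NO:7 is an assumption made
here from the stated length of the insert.

```python
# Assumed 20-residue reading of SEQ ID NO:7 (see lead-in).
INSERT = "GADTEDVVCCSMSYTWTGVH"


def split_insert(insert, linker_len=2):
    """Return (HCV 5A/5B-derived residues, linker-encoded residues)."""
    return insert[:-linker_len], insert[-linker_len:]


def cleave_between(seq, p1="C", p1_prime="S"):
    """Cut `seq` at the first p1|p1_prime bond.

    Returns the N-terminal and C-terminal fragments; if the bond is
    absent, the whole sequence is returned with an empty second part.
    """
    idx = seq.find(p1 + p1_prime)
    if idx < 0:
        return seq, ""
    return seq[:idx + 1], seq[idx + 1:]


hcv_part, linker_part = split_insert(INSERT)
n_fragment, c_fragment = cleave_between(INSERT)
```

Under this assumption the Cys-Ser bond lies in the middle of
the insert, so cleavage leaves ten inserted residues on each
side of the cut.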

In another construct, we performed site
directed mutagenesis to alter the native pre-IL-1β
cleavage site of Asp116-Ala117-Pro118 to Cys-Ser-Met, a
conserved recognition sequence for NS3. This construct
is referred to throughout the application as
"pre-IL-1β(CSM)".

C. NS3-4A-Δ4B-IL-1β

In order to create a single fusion polypeptide
that encoded both the exogenous protease and the
polypeptide substrate, we utilized the fact that NS3 can
autoprocess (cleave) an NS3-4A-4B polyprotein at both the
NS3-4A and 4A-4B junctions.

We isolated a DNA fragment that encoded NS3-4A
and the first 60 amino acids of 4B through PCR using the
HCV strain H cDNA referred to above (SEQ ID NO:1) and the
following primers: SEQ ID NO:8:
5' GGACTAGTCTGCAGTCTAGAGCTCCATGGCGCCCATCACGGCGTACG 3' and
SEQ ID NO:9: 3' GGACGCGGTCTGCAGGAGGCCGAGGGC 5'. The PCR
products were digested with PstI and XbaI prior to
cloning.

The mature IL-1β portion of the construct
(amino acids 117-269 of SEQ ID NO:11) was created by PCR
cloning of full length pre-IL-1β cDNA (SEQ ID NO:10)
using the following primers:
SEQ ID NO:12: 5' CTCGGCCTCCTGCAGGCACCTGTACGATCACTGAAC 3';
and SEQ ID NO:13: 3' GGGAATTCTAGATTTTAGGAAGACACAAATTG 5'.
These PCR products were digested with PstI and EcoRI
prior to cloning.

The NS3-4A-Δ4B and IL-1β fragments were then
ligated together with XbaI/EcoRI digested pcDL-SRα to
obtain the desired construct.

As a control we created a mutant NS3 protease
fusion protein construct. This construct was identical
to the one described above, except that the NS3 portion
was created by PCR using the same primers and the cDNA of
the NS3 active site mutant S1165A [A. Grakoui et al., J.
Virol., 67, pp. 2832-43 (1993)]. The NS3 active site
mutant contains a serine-to-alanine mutation in its
active site, rendering the enzyme inactive.

EXAMPLE 2

Transfection Of COS Cells And Assay Of Secreted IL-1β

The expression plasmid constructs described in
Example 1 were transfected into COS-7 cells using the
DEAE-Dextran transfection protocol [Gu et al., Neuron, 5,
pp. 147-57 (1990)]. COS cells in 6-well clusters or 100
mm dishes at 50% confluency were transfected with 4-10 µg
of the desired plasmid in a DEAE-Dextran solution.
Following transfection, the cells were incubated an
additional 48 hours before assaying.

The processing of pre-IL-1β or NS3-4A-Δ4B-IL-1β
fusion protein and subsequent secretion of mature IL-1β
into the media was measured by ELISA of IL-1β using an
antibody that was specific for mature IL-1β (approx. 3%
cross-reactivity with pre-IL-1β). We analyzed expression
by harvesting the COS cells in ice-cold phosphate
buffered saline, lysing the cells in a 0.1% Triton X-100
buffer and centrifuging the lysate to remove cell debris.
The lysates were then analyzed by SDS-PAGE and
immunoblotting using an IL-1β antibody (Genzyme) and an
NS3 antibody. Alternatively, expression, processing and
secretion were analyzed by labelling the cells for 24
hours in the presence of [35S]-methionine, incubating the
cells for an additional 24 hours after the label was
removed and then utilizing immunoprecipitation and
SDS-PAGE to analyze the polypeptides.

EXAMPLE 3

NS3-Specific Processing Of An NS3-4A-Δ4B-IL-1β Fusion
Protein And Secretion Of Δ4B-IL-1β Into The Media

Transfectants expressing the NS3-4A-Δ4B-IL-1β
fusion protein autoprocessed that protein at both the
NS3-4A and 4A-4B junctions. The cell lysates of these
transfectants were subjected to Western blotting
utilizing an anti-NS3 antibody. Figure 3, panel A, Wt-1
and Wt-2 lanes, shows that this experiment produced a
doublet band in the 70 kDa area, present only as a single
band in the untransformed control cells (panel A, No DNA
lane). The second band of the doublet in the Wt-1 and
Wt-2 lanes corresponds to the size of mature NS3. A
transfectant that expressed an inactive mutant
NS3-containing NS3-4A-Δ4B-IL-1β fusion protein demonstrated
no 70 kDa doublet and therefore was not autoprocessed
(NS3 mutant lane). A transfectant that co-expressed the
same mutant fusion protein together with a truncated, but
active, NS3 (NS3 (1-180)) was also analyzed.
Surprisingly, the mutant fusion protein did not appear to
be cleaved by NS3 (1-180), as indicated by the lack of a
doublet in the 70 kDa region (NS3 mutant + NS3 (1-180)
lane). However, a 20 kDa band representing the truncated
NS3 was detected in that lysate, as indicated by the
NS3 (1-180) arrow.

A similar experiment performed on cell lysates
utilizing a mature IL-1β-specific antibody demonstrated
the presence of a band corresponding in size to the
Δ4B-IL-1β portion of the fusion protein in both the
NS3-4A-Δ4B-IL-1β transfectants (Figure 3, panel B, Wt-1 and
Wt-2 lanes) and, to a lesser degree, in the NS3 mutant fusion
protein/NS3(1-180) cotransfectant. Virtually no IL-1β
was detected in the NS3 mutant fusion protein expressing
transfectant (IL-1β arrow). These experiments confirm
that the cleavage observed in the wild type
NS3-4A-Δ4B-IL-1β transfectants was dependent upon NS3 protease
activity. Thus, we had proof that cleavage of this
fusion protein was essentially NS3-dependent and not
caused by some endogenous protease.

Secretion of the cleaved substrate was
determined by assaying culture media with a commercially
available mature IL-1β-specific ELISA assay (R&D Systems,
Minneapolis, MN). For the wild-type NS3-containing
construct we detected a concentration of 2.5 µg/ml of
IL-1β in the medium. We detected less than 0.25 µg/ml of
IL-1β in the media of cells transfected with the mutant
NS3-containing construct. An immunoprecipitation experiment
utilizing the same anti-IL-1β antibody demonstrated the
presence of Δ4B-IL-1β in the media of cells containing
the wild type NS3-containing construct, but none from the
mutant NS3-containing construct (Figure 4), thus
confirming these results.
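
The ELISA comparison above amounts to a signal-to-background ratio. The following is a minimal Python sketch, not part of the original disclosure, that computes that ratio from the two reported values. The function name is hypothetical. Because the mutant value is reported only as an upper bound (less than 0.25 µg/ml), the result should be read as a lower bound on the fold difference.

```python
def fold_over_background(signal, background_upper_bound):
    """Lower bound on the fold difference between a measured signal and a
    background that is only known to be below `background_upper_bound`.
    Both values must be in the same units."""
    if background_upper_bound <= 0:
        raise ValueError("background upper bound must be positive")
    return signal / background_upper_bound


# Values reported in Example 3 (IL-1beta in the medium, ug/ml).
wild_type_construct = 2.5
mutant_construct_upper_bound = 0.25

ratio = fold_over_background(wild_type_construct, mutant_construct_upper_bound)
print(f"wild type is at least {ratio:g}-fold above the mutant control")
```

A ratio of this kind only summarizes the two numbers given in the text; it says nothing about assay variance, which the source does not report.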

EXAMPLE 4 

NS3-Specific Processing Of Mutated Pre-IL-1β
Containing An Artificial Cleavage Site And
Secretion Of IL-1β Into The Media

We confirmed that NS3 protease can cleave 
artificial substrates other than an HCV polypeptide by 
cotransfecting COS cells with the NS3-4A and either of 
the pre-IL-1β-containing artificial substrate expression
constructs described in Example 1C. 

Co-expression of the NS3-4A and pre-IL-1β*
substrate sequences resulted in rapid cleavage of the
substrate and concomitant secretion of a 19 kDa IL-1β into
the media. Secretion was quantitated using an ELISA
specific for the processed form of IL-1β. An immunoblot
of cell lysates from these transformants demonstrated the
presence of both cleaved and uncleaved substrate (Figure
5, NS3-4A + IL-1β* lane). The same experiment was
performed using cells that were metabolically labelled
with [35S]-methionine, followed by immunoprecipitation of
the media with the processed IL-1β-specific antibody.
The results of the immunoprecipitation experiment are
shown in Figure 6, NS3-4A + pre-IL-1β* lanes.

When we coexpressed NS3-4A and the
pre-IL-1β(CSM) sequences, we also observed cleavage of the
substrate at the predicted Cys116-Ser117 site. Both cleaved
and uncleaved forms were observed in cell lysates using
immunoblotting specific for IL-1β (Figure 5, NS3-4A +
IL-1β(CSM) lane). Immunoprecipitation of the media from
[35S]-methionine labelled cells also demonstrated the
presence of an IL-1β-containing cleavage product, but less
than that observed for the 5A-5B-containing pre-IL-1β
substrate (Figure 6, NS3-4A + pre-IL-1β(CSM) lane).

EXAMPLE 5 
Assay of NS3 Inhibitors 

We tested the potential of compounds VH-15924
and VH-16075 as HCV NS3 protease inhibitors in our
assays.

Transfectants expressing the NS3-4A-Δ4B-IL-1β
fusion protein were grown in the presence of varying
amounts of VH-15924. Even at concentrations as high as
100 µM, we detected the presence of the cleavage product,
Δ4B-IL-1β, in the media. This indicated that VH-15924 was
not an effective inhibitor of NS3 protease.

We also assayed the inhibition of cleavage and
secretion of pre-IL-1β* substrate by both VH-15924 and
VH-16075. VH-16075 inhibited cleavage and secretion with
an IC50 of 4 µM. As in the previous experiment, VH-15924
did not completely inhibit cleavage/secretion even at
concentrations of 100 µM (Figure 7).
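
An IC50 such as the one quoted above is typically read off a dose-response series. The following Python sketch shows one simple way to do that, by interpolating on a log-concentration scale between the two doses that bracket 50% of the uninhibited signal. It is an illustration only: the function name and the readings are hypothetical, and the source does not state how its IC50 was calculated.

```python
import math


def ic50_log_interpolation(concentrations, percent_of_control):
    """Estimate IC50 by linear interpolation on log10(concentration).

    concentrations: positive doses in ascending order (any one unit).
    percent_of_control: secreted-product signal at each dose, expressed
        as a percentage of the signal without inhibitor.
    Returns the interpolated IC50 in the same unit, or None if the
    signal never falls to 50% within the tested range.
    """
    points = list(zip(concentrations, percent_of_control))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= 50.0 >= r_hi and r_lo != r_hi:
            fraction = (r_lo - 50.0) / (r_lo - r_hi)
            log_ic50 = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    return None


# Hypothetical readings, NOT data from the patent.
doses = [1.0, 10.0, 100.0]
strong_inhibitor = [80.0, 20.0, 5.0]    # crosses 50% between 1 and 10
weak_inhibitor = [95.0, 80.0, 60.0]     # never reaches 50%

print(ic50_log_interpolation(doses, strong_inhibitor))
print(ic50_log_interpolation(doses, weak_inhibitor))
```

Returning None for a series that never reaches 50% keeps a weak compound from being assigned a spurious IC50; a fitted sigmoidal model would be the usual choice when more dose points are available.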

While I have hereinbefore presented a number of
embodiments of this invention, it is apparent that my
basic construction can be altered to provide other
embodiments which utilize the methods of this invention.
Therefore, it will be appreciated that the scope of this
invention is to be defined by the claims appended hereto
rather than the specific embodiments which have been
presented hereinbefore by way of example.
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(i) APPLICANT: Su, Michael 
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(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 3420..5312

(D) OTHER INFORMATION: /product= "NS3 protease"

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 5313..5474



SUBSTITUTE SHEET (RULE 26) 




PCT/US96/06070 



(D) OTHER INFORMATION: /product= "NS4A"

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 5475..5552

(D) OTHER INFORMATION: /product= "truncated NS4B"
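
The three mat_peptide locations given for SEQ ID NO:1 can be sanity-checked with a few lines of arithmetic. The Python sketch below, which is an illustrative addition and not part of the sequence listing, computes the length of each feature, confirms that each length is a whole number of codons, and checks that the features abut one another. The function names are hypothetical; the coordinates are taken from the FEATURE entries and treated as 1-based and inclusive.

```python
# Feature coordinates copied from the (ix) FEATURE entries of SEQ ID NO:1
# (1-based, inclusive, as written in the sequence listing).
FEATURES = [
    ("NS3 protease", 3420, 5312),
    ("NS4A", 5313, 5474),
    ("truncated NS4B", 5475, 5552),
]


def feature_summary(features):
    """Return (name, nucleotides, whole codons, leftover nucleotides)."""
    rows = []
    for name, start, end in features:
        length = end - start + 1              # inclusive coordinates
        codons, leftover = divmod(length, 3)
        rows.append((name, length, codons, leftover))
    return rows


def are_contiguous(features):
    """True if each feature starts one base after the previous one ends."""
    return all(
        nxt[1] == prev[2] + 1 for prev, nxt in zip(features, features[1:])
    )


for row in feature_summary(FEATURES):
    print(row)
print("contiguous:", are_contiguous(FEATURES))
```

Under these assumptions the three features cover 1893, 162 and 78 nucleotides, that is 631, 54 and 26 codons, with no leftover bases.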




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GCCAGCCCCC TGATGGGGGC GACACTCCAC CATAGATCAC TCCCCTGTGA GGAACTACTG 60
TCTTCACGCA GAAAGCGTCT AGCCATGGCG TTAGTATGAG TGTCGTGCAG CCTCCAGGAC 120
CCCCCCTCCC GGGAGAGCCA TAGTGGTCTG CGGAACCGGT GAGTACACCG GAATTGCCAG 180
GACGACCGGG TCCTTTCTTG GATAAACCCG CTCAATGCCT GGAGATTTGG GCGTGCCCCC 240
GCAAGACTGC TAGCCGAGTA GTGTTGGGTC GCGAAAGGCC TTGTGGTACT GCCTGATAGG 300
GTGCTTGCGA GTGCCCCGGG AGGTCTCGTA GACCGTGCAC CATGAGCACG AATCCTAAAC 360
CTCAAAGAAA AACCAAACGT AACACCAACC GTCGCCCACA GGACGTCGAG TTCCCGGGTG 420
GCGGTCAGAT CGTTGGTGGA GTTTACTTGT TGCCGCGCAG GGGCCCTAGA TTGGGTGTGC 480
GCGCGACGAG GAAGACTTCC GAGCGGTCGC AACCTCGTGG TAGACGTCAG CCTATCCCCA 540
AGGCACGTCG GCCCGAGGGC AGGACCTGGG CTCAGCCCGG GTACCCTTGG CCCCTCTATG 600
GCAATGAGGG TTGCGGGTGG GCGGGATGGC TCCTGTCTCC CCGTGGCTCT CGGCCTAGCT 660
GGGGCCCCAC AGACCCCCGG CGTAGGTCGC GCAATTTGGG TAAGGTCATC GATACCCTTA 720
CGTGCGGCTT CGCCGACCTC ATGGGGTACA TACCGCTCGT CGGCGCCCCT CTTGGAGGCG 780
CTGCCAGGGC CCTGGCGCAT GGCGTCCGGG TTCTGGAAGA CGGCGTGAAC TATGCAACAG 840
GGAACCTTCC TGGTTGCTCT TTCTCTATCT TCCTTCTGGC CCTGCTCTCT TGCCTGACTG 900
TGCCCGCTTC AGCCTACCAA GTGCGCAATT CCTCGGGGCT TTACCATGTC ACCAATGATT 960
GCCCTAATTC GAGTATTGTG TACGAGGCGG CCGATGCCAT CCTGCACACT CCGGGGTGTG 1020
TCCCTTGCGT TCGCGAGGGT AACGCCTCGA GGTGTTGGGT GGCGGTGACC CCCACGGTGG 1080
CCACCAGGGA CGGCAAACTC CCCACAACGC AGCTTCGACG TCATATCGAT CTGCTTGTCG 1140
GGAGCGCCAC CCTCTGCTCA GCCCTCTACG TGGGGGACCT GTGCGGGTCT GTTTTTCTTG 1200
TTGGTCAACT GTTTACCTTC TCTCCCAGGC GCCACTGGAC GACGCAAAGC TGCAATTGTT 1260
CTATCTATCC CGGCCATATA ACGGGTCATC GCATGGCATG GGATATGATG ATGAACTGGT 1320
CCCCTACGGC AGCGTTGGTG GTAGCTCAGC TGCTCCGGAT CCCACAAGCC ATCATGGACA 1380
TGATCGCTGG TGCTCACTGG GGAGTCCTGG CGGGCATAGC GTATTTCTCC ATGGTGGGGA 1440
ACTGGGCGAA GGTCCTGGTA GTGCTGCTGC TATTTGCCGG CGTCGACGCG GAAACCCACG 1500
TCACCGGGGG AAGTGCCGGC CACACCACGG CTGGGCTTGT TGGTCTCCTT ACACCAGGCG 1560
CCAAGCAGAA CATCCAACTG ATCAACACCA ACGGCAGTTG GCACATCAAT AGCACGGCCT 1620






WO 96/34976 PCT/US96/06070












GCT CTT CTAT 


CGCCACAAAT 


1680 


TCAACTCTTC 


AGGCT GTCCT 


GAGAGG 1 1 w 


C LAULl etc g 


AC GC CTT AC C 


GATTTTGC C C 


1740 




t p rr* at r a «r 


TATfVPAflPft 




1 GACGAACGC 


C C CT ACT GTT 


1800 


fine* ft r* r*r* 
GGCAL. i AL»ww 




I G 1 GGCAl 1 (7 


tT*r*t~T t r*cr*'T\ ft ft 
TGCCCGCAAA 


GAGC GT GT GT 


GGCCCGGTAT 


1860 


ATT GCTTCAC 




GTGGTGGTGG 


GAACGACCGA 


CAGGT C GGGC 


GCGCCTACCT 


1920 


ACA GC T GGGG 


I GCAAAT GAT 


AC GGAT GT CT 


TCGTCCTTAA 


CAACACCAGG 


CCACCGCTGG 


1980 


GCAATTGGTT 


CGGTTGTACC 


TGGATGAACT 


CAACT GGATT 


CACCAAAGTG 


TGCGGAGCGC 


2040 


CCCCTTGTGT 


CATCGGAGGG 


GxGGGCAACA 


ACACCTTGCT 


CTGCCCCACT 


GATTGCTTCC 


2100 


GCAAACATCC 


GGAAGCCACA 


TACTCTCGGT 


GCGGCTCCGG 


TCCCT GGATT 


ACACC CAGGT 


2160 


GCATGGTCGA 


CTACC C GTAT 


AGGCTTT GGC 


ACT AT CCTT G 


TACTATCAAT 


TACACCATAT 


2220 


TCAAAuT CAu 


GAT QTAUbT G 


GGAGGGGTCG 


ft ^^ft ^ ft f*r*r**w% 

AGCACAGGCT 


GGAAGCGGCC 


TGCAACTGGA 


2280 


CGCGGGGCGA 


ACGCTGT GAT 


CTGGAAGACA 


GGGACAGGT C 


CGAGCTCAGC 


CCATTGCTGC 


2340 


TGTCCACCAC 


ACAGTGGCAG 


GTCCTTCCGT 


GTTCTTTCAC 


GACCCTGCCA 


GCCTTGTCCA 


2400 






CAGAACATT G 


T GGAC GT GCA 


GT ACT T GT AC 


GGGGTGGGGT 


2460 


r*ft RrpnTpri^ 


GTCCT GGGCC 


ATTAAGTGGG 


AGTACGT C GT 


TCTCCTGTTC 


CTTCTGCTTG 


2520 


pa ffAPfip <tr cs 


t G 1L1 V)C X wt— 


TGCTTGTGGA 


T GATGTTACT 


CAT AT C C CAA 


GC GGAGGCGG 


2580 


CPTT f^ZZ A /*?T\ ft 
Ulll VJiJtrilsHM 


P f**F P PT ft ft T ft 


p ft ft 1* ft 


W\l ILL I bbU 


C GGGACGCAC 


GGT CTT GTGT 


2640 


CCTTCCTCGT 


GTTCTTCTGC 


ill Gl*G X GG 1 


n *p r^p r* ft a fif^r* 
Al l> I wviuw 


lAwl wuT G 


CCCGGAGCGG 


2700 


. TCTACGCCTT 


CTACGGGATG 


TGGCCTCTCC 




GCTGGCGTTG 


C CT CAGCGGG 


2760 


CATACGCACT 


GGACACGGAG 




Wtfl Ul www 


C GTT GTT CTT 


GT C GGGTTAA 


2820 


TGGCGCTGAC 


TCTGTCACCA 


»p ft »p*p ft p ft ft izr* 


u^liilnl 




TGGTGGCTTC 


2880 


AGTATTTT CT 


GACCAGAGTA 


GAAGCGCAAC 






w X WtfttU 1 w^m> 


•9 a a n 


GGGGGGGGCG 


CGATGCCGTC 


AT PTT A f*TPA 






w X wVl/\X llu 


0 Ann 


ACATCACCAA ACTACTCCTG 


GCCATCTTCG 


GACCCCTTTG 


GATTCTTCAA 


GCCAGTTTCr 


JUOU 


TTAAAGTCCC 


CTACTTCGTG 


CGCGTTCAAG 


GCCTTCTCCG 


GATCTGCGCG 


CTAGCGCGGA 


3120 


AGATAGCCGG AGGTCATTAC 


GTGCAAATGG 


CCATCATCAA 


GTTGGGGGCG 


CTT ACT GGCA 


3180 


CCTATGTGTA 


TAACCATCTC 


ACCCCTCTTC 


GAGACTGGGC 


GCACAACGGC 


CTGCGAGATC 


3240 


TGGCCGTGGC 


TGTGGAACCA 


GTCGTCTTCT 


CCCGAATGGA 


GACCAAGCTC 


ATCACGTGGG 


3300 


GGGCAGATAC 


CGCCGCGTGC 


GGTGACATCA 


TCAACGGCTT 


GCCCGTCTCT 


GCCCGTAGGG 


3360 


GCCAGGAGAT 


ACTGCTTGGA 


CCAGCCGACG 


GAATGGTCTC 


CAAGGGGTGG 


AGGTTGCTGG 


3420 


CGCCCATCAC 


GGCGTACGCC 


CAGCAGACGA 


GAGGCCTCCT 


AGGGTGTATA 


ATCACCAGCC 


3480 


TGACTGGCCG 


GGACAAAAAC 


CAAGTGGAGG 


GTGAGGTCCA 


GATCGTGTCA 


ACTGCTACCC 


3540 
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AAACCTTCCT 


GGCAACGTGC ATCAATGGGG 


TATGCTGGAC 


TGTCTAC CAC 




JOUU 


CGAGGACCAT 


CGCATCACCC AAGGGTCCTG 


TCATCCAGAT 


GTAT AC CAAT 




•3££A 
JODU 


ACCTTGTGGG 


CTGGCCCGCT CCTCAAGGTT 


CCCGCTCATT 


GACACCCTGC 






CCTCGGACCT 


TTACCTGGTT ACGAGGCACG 


CCGACGTCAT 


TCCCGTGCGC 






ATAGCAGGGG 


TAGCCTGCTT TCGCCCCGGC 


CCATTTCCTA 


CCTAAAAGGC 






GTCCGCTGTT 


GTGCCCCGCG GGACACGCCG 


TGGGCCTATT 


CAGGGC C GC G 






GTGGAGTGAC 


CAAGGCGGTG GACTTTATCC 


CTGTGGAGAA 






39oQ 


CCCCGGTGTT 


CACGGACAAC TCCTCTCCAC 


CAGCAGTGCC 


c cag zx crrr c 




4020 


ACCTGCATGC 


TCCCACCGGC AGTGGTAAGA 


GCACCAAGGT 




TACGCAGCCC 


4080 


AGGGCTACAA 


GGTGTTGGTG CTCAACCCCT 


CTGTTGCTGC 




TTTGGTGCTT 


4140 


ACATGTCCAA 


GGCCCATGGG GTCGATCCTA ATATCAGGAC 




ACAATTACCA 


4200 


CTGGCAGCCC 


CATCACGTAC TCCACCTACG 


GCAAGTTCCT 


TGCCGACGGC 


GGGTGCTCAG 


4260 


GAGGCGCTTA TGACATAATA ATTTGTGACG AGTGCCACTC 


CACGGATGCC 


ACATCCATCT 


4320 


TGGGCATCGG 


CACTGTCCTT GACCAAGCAG 


AGACTGCGGG 


GGCGAGATTG 


GTTGTGCTCG 


4380 


CCACTGCTAC 


CCC7CCGGGC TCCGTCACTG 


TGTCCCATCC 


TAACATCGAG 


GAGGTTGCTC 


4440 


TGTCCACCAC 


CGGAGAGATC CCTTTCTACG 


GCAAGGCTAT 


CCCCCTCGAG 


oluAruAAbij 


4500 


GGGGAAGACA TCTCATCTTC TGTCACTCAA AGAAGAAGTG 


CGACGAGCTC 




A c £ n 


TGGTCGCATT 


GGGCATCAAT GCCGTGGCCT 


ACTACCGCGG ACTTGACGTG 






CGACCAACGG 


CGATGTTGTC GTCGTGTCGA 


CCGATGCTCT 


CATGACTGGC 


TTT AC CGGCG 




ACTTCGACTC 


TGTGATAGAC TGCAACACGT 


GTGTCACTCA 


GACAGTCGAT 


TT CAGC CTTG 


4740 


ACCCTACCTT 


TACCATTGAG ACAACCACGC 


TCCCCCAGGA 


TGCTGTCTCC 


AGGACTCAGC 


4800 


GCCGGGGCAG 


GACTGGCAGG GGGAAGCCAG 


GCATCTACAG ATTTGTGGCA 


CCGGGGGAGC 


4860 


GCCCCTCCGG 


CATGTTCGAC TCGTCCGTCC 


TCTGTGAGTG 


CTATGACGCG 


GGCTGTGCTT 


4920 


GGTATGAGCT 


CATGCCCGCC GAGACTACAG 


TTAGGCTACG 


AGCGTACATG 


AACACCCCGG 


4980 


GGCTTCCCGT 


GTGCCAGGAC CATCTTGAAT 


TTTGGGAGGG 


CGTCTTTACG 


GGCCTCACCC 


5040 


ATATAGATGC 


CCACTTTCTA TCCCAGACAA 


AGCAGAGTGG 


GGAGAACTTT 


C CTT AC CT GG 


5100 


TAGCGTACCA AGCCACCGTG TGCGCTAGGG 


CTCAAGCCCC 


TCCCCCATCG 


TGGGACCAGA 


5160 


TGTGGAAGTG 


TTTGATCCGC CTTAAACCCA 


CCCTCCATGG 


GCCAACACCC 


CTGCTATACA 


5220 


GACTGGGCGC 


TGTTCAGAAT GAAGTCACCC 


TGACGCACCC 


AATCACCAAA 


TACATCATGA 


5280 


CATGCATGTC 


GGCCGACCTG GAGGTCGTCA 


CGAGCACCTG 


GGTGCTCGTT 


GGCGGCGTCC 


5340 


TGGCTGCTCT 


GGCCGCGTAT TGCCTGTCAA 


CAGGCTGCGT 


GGTCATAGTG 


GGCAGGATTG 


5400 


TCTTGTCCGG 


GAAGCCGGCA ATTATACCTG ACAGGGAGGT 


TCTCTACCAG 


GAGTTCGATG 


5460 
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AGATGGAAGA 


GTGCTCTCAG 


CACTTACCGT 


ACATCGAGCA AGGGAT GAT G 


CTCGCTGAGC 


5520 


AGTTCAAGCA 




GGCCTCCTGC 


AGACCGCGTC 


CCGCCATGCA 


GAGGTT AT CA 


5580 




f"*C & fl & f*C Tift f* 


TGGCAGAAAC 


TCGAGGTCTT 


CTGGGCGAAG 


CACATGT GGA 


5640 


ATT TCATCAG 




TATTTGGCGG 


GCCTGTCAAC 


GCTGCCTGGT 


AACCCCGCCA 


5700 


TTGCTTCATT 


GAT GGCTTTT 


ACAGCT GC CG 


TCACCAGCCC 


ACTAACCACT 


GGCCAAACCC 


5760 


TCCTCTTCAA 


CAT ATT GGGG 


GGGTGGGTGG 


CTGCCCAGCT 


CGCCGCCCCC 


GGTGCCGCTA 


5820 


CCGCCTTTGT 


GGGCGCTGGC 


TTAGCTGGCG 


CCGCCATCGG 


CAGCGTTGGA 


CTGGGGAAGG 


5880 


TCCTCGTGGA 


CATTCTTGCA 


GGGTATGGCG 


CGGGCGTGGC 


GGGAGCTCTT 


GTAGCATTCA 


5940 


AGATCAT GAG 


CGGTGAGGTC 


CCCTCCACGG 


AGGACCT GGT 


CAATCT GCT G 


CCCGCCATCC 


6000 


TCTCGCCTGG 


AGCCCTTGTA 


GTCGGTGTGG 


TCTGCGCAGC 


AATACTGCGC 


CGGCACGTTG 


6060 




GGGGGCAGTG 


CAATGGATGA 


ACCGGCTAAT 


AGCCTTCGCC 


TCCCGGGGGA 


6120 


ACCATGTTTC 


CCCCACGCAC 


TACGTGCCGG 


AGAGCGATGC 


AGCCGCCCGC 


GTCACTGCCA 


6180 




CCTCACTGTA ACCCAGCTCC 


TGAGGCGACT 


ACAT CAGTGG 


ATAAGCTCGG 


6240 


AGT GT AC CAC 


TCCATGCTCC 


GGCTCCTGGC 


TAAGGGACAT 


CTGGGACTGG 


ATATGCGAGG 


6300 


1 VsVTl (jAGCGA 


CTTTAAGACC 


TGGCTGAAAG 


CCAAGCTCAT 


GC CACAACTG 


CCTGGGATTC 


6360 


CCTTTGTGTC 


CTGCCAGCGC 


GGGTATAGGG 


GGGTCTGGCG 


AGGAGACGGC 


ATTATGCACA 


6420 


CTCGCTGCCA 


CTGTGGAGCT 


GAGATCACTG 


GACATGTCAA AAACGGGACG 


ATGAGGATCG 


6480 


TCGGTCCTAG 


GACCTGCAGG AACATGTGGA 


GTGGGACGTT 


CCCCATTAAC 


GCCTACACCA 


6540 


CGGGCCCCTG 


TACTCCCCTT 


CCTGCGCCGA 


ACTATAAGTT 


CGCGCTGTGG 


AGGGTGTCTG 


6600 


CAGAGGAATA 


CGTGGAGATA AGGCGGGTGG 


GGGACTTCCA 


CTACGTATCG 


GGTATGACTA 


6660 


CTGACAATCT 


TAAATGCCCG 


TGCCAGATCC 


CATCGCCCGA 


ATTTTTCACA 


GAATT GGACG 


6720 


GGGTGCGCCT ACATAGGTTT 


GCGCCCCCTT 


GCAAGCCCTT 


GCTGCGGGAG 


GAGGTATCAT 


6780 


TCAGAGTAGG ACTCCACGAG 


TACCCGGTGG 


GGTCGCAATT 


ACCTTGCGAG 


GCCGtAACCGG 


6840 


ACGTAGCCGT 


GTTGACGTCC 


ATGCTCACTG 


ATCCCTCCCA 


TATAACAGCA 




6900 


GGAGAAGGTT 


GGCGAGAGGG 


TCACCCCCTT 


CTATGGCCAG 


CTCCTCGGCC 




o9oU 


CCGCTCCATC 


TCTCAAGGCA ACTTGCACCG 


CCAACCATGA 


CTCCCCTGAC 


GCCGAGCTCA 


7020 


TAGAGGCTAA 


CCTCCTGTGG 


AGGCAGGAGA 


TGGGCGGCAA 


CATCACCAGG 


GTTGAGTCAG 


7080 


AGAACAAAGT 


GGTGATTCTG 


GACTCCTTCG 


ATCCGCTTGT 


GGCAGAGGAG 


GATGAGCGGG 


7140 


AGGTCTCCGT 


ACCCGCAGAA ATTCTGCGGA 


AGTCTCGGAG 


ATTCGCCCGG 


GCCCTGCCCG 


7200 


TTTGGGCGCG 


GCCGGACTAC 


AACCCCCCGC 


TAGTAGAGAC 


GTGGAAAAAG 


CCTGACTACG 


7260 


AACCACCTGT 


GGTCCATGGC 


TGCCCGCTAC 


CACCTCCACG 


GTCCCCTCCT 


GTGCCTCCGC 


7320 


CTCGGAAAAA 


GCGTACGGTG 


GTCCTCACCG 


aatcaaccct 


ACCTACTGCC 


TTGGCCGAGC 


7380 
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TTGCCACCAA AAGTTTTGGC AGCTCCTCAA 


CTTCCGGCAT 




HAT AT GACAA 


7440 


CATCCTCTGA 


GCCCGCCCCT TCTGGCTGCC 


CCCCCGACTC 


C GAC GTT RA fZ 


T C CTATT CTT 


•7 C ft A 

7500 


CCATGCCCCC 


CCTGGAGGGG GAGCCTGGGG ATCCGGATTT 






T C £ A 

7360 


CGGTCAGTAG 


TGGGGCCGAC ACGGAAGATG 


TCGTGTGCTG 


CT CAATGT CT 


T TA T B. f^fi flf* TA 


"T C> ft 


CAGGCGCACT 


CGTCACCCCG TGCGCTGCGG 


AAGAACAAAA 


ACT GC C C Z1T C 


nHL v>tACT GA 


T £ O ft 

7680 


GCAACTCGTT 


GCTACGCCAT CACAATCTGG 


TATATTCCAC 




AuTGCTTGCC 


7740 


AAAGGCAGAA 


GAAAGTCACA TTTGACAGAC 


TGCAAGTTCT 




TACCAGGACG 


7800 


TGCTCAAGGA GGTCAAAGCA GCGGCGTCAA AAGTGAAGGC 


T AACTT GCTA 


T C C GT AGAGG 


7660 


AAGCTTGCAG 


CCTGACGCCC CCACATTCAG 


CCAAATCCAA 


GTTT GGCTAT 


GGGGCAAAAG 


7920 


ACGTCCGTTG 


CCATGCCAGA AAGGCCGTAG 


CCCACATCAA 


CTCCGTGTGG 


AAAGACCTTC 


7980 


TGGAAGACAG 


TGTAACACCA ATAGACACTA TCATCATGGC 


CAAGAACGAG 


GTCTTCTGCG 


8040 


TTCAGCCTGA 


GAAGGGGGGT CGTAAGCCAG 


CTCGTCTCAT 


CGTGTTCCCC 


GACCTGGGCG 


8100 


TGCGCGTGTG 


CGAGAAGATG. GCCCTGTACG 


ACGTGGTTAG 


CAAACTCCCC 


CTGGCCGTGA 


8160 


TGGGAAGCTC 


CTACGGATTC CAATACTCAC 


CAGGACAGCG 


GGTTGAATTC 


CTCGTGCAAG 


8220 


CGTGGAAGTC 


CAAGAAGACC CCGATGGGGT 


TCCCGTATGA 


TACCCGCTGT 


TTTGACTCCA 


8280 


CAGTCACTGA 


GAGCGACATC CGTACGGAGG 


AGGCAATTTA 


CCAATGTTGT 


GACCTGGACC 


8340 


CCCAAGCCCG 


CGTGGCCATC AAGTCCCTCA 


CTGAGAGGCT 


TTATGTTGGG 


GGCCCTCTTA 


8400 


CCAATTCAAG 


GGGGGAAAAC TGCGGCTATC 


GCAGGTGCCG 


CGCGAGCGGC 


GT ACT GACAA 


8460 


CTAGCTGTGG TAACACCCTC ACTTGCTACA 


TCAAGGCCCG 


GGCAGCCCGT 


CGAGCCGCAG 


8520 


GGCTCCAGGA 


CTGCACCATG CTCGTGTGTG 


GCGACGACTT 


AGTCGTTATC 


TGTGAAAGTG 


8560 


CGGGGGTCCA 


GGAGGACGCG GCGAGCCTGA 


GAGCCTTTAC 


GGAGGCTATG ACCAGGTACT 


8640 


CCGCCCCCCC 


CGGGGACCCC CCACAACCAG AATACGACTT 


GGAGCTTATA ACATCATGCT 


0"Tft ft 

o70Q 


CCTCCAACGT 


GTCAGTCGCC CACGACGGCG 


CTGGAAAAAG 


GGTCTACTAC 


CTTACCCGTG 


0 / DO 


ACCCTACAAC 


CCCCCTCGCG AGAGCCGCGT 


GGGAGACAGC 


AAGACACACT 


CCAGTCAATT 


ft b*? n 


CCTGGCTAGG 


CAACATAATC ATGTTTGCCC 


CCACACTGTG 


GGCGAGGATG 


ATACTGATGA 


8880 


CCCATTTCTT 


TAGCGTCCTC ATAGC CAGGG ATCAGCTTGA ACAGGCTCTT AACTGTGAGA 


8940 


TCTACGCAGC 


CTGCTACTCC AT AGAAC CAC 


TGGATCTACC 


TCCAATCATT 


CAAAGACTCC 


9000 


ATGGCCTCAG 


CGCATTTTTA CTCCACAGTT 


ACTCTCCAGG 


TGAAGTCAAT 


AGGGTGGCCG 


9060 


CATGCCTCAG AAAACTTGGG GTCCCGCCCT 


TGCGAGCTTG 


GAGACACCGG 


GCCCGGAGCG 


9120 


TCCGCGCTAG 


GCTTCTGTCC AGGGGAGGCA 


GGGCTGCCAT 


ATGTGGCAAG 


TACCTCTTCA 


9180 


ACTGGGCAGT AAGAACAAAG CTCAAACTCA 


CTCCAATAGC 


GGCCGCTGGC 


CGGCTGGACT 


9240 


TGTCCGGTTG 


GTTCACGGCT GGCTACAGCG 


GGGGAGACAT 


TTATCACAGC 


GTGTCTCATG 


9300 
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CCCGGCCCCG CTGGTTCTGG TTTTGCCTAC TCCTGCTCGC TGCAGGGGTA GGCATCTACC 9360 
TCCTCCCCAA CCGGTGAACG GGGAGCTAGA CACTCCGGCC T 9401 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

GGACTAGTCT GCAGTCTAGA GCTCCATGGC GCCCATCACG GCGTACG 47 
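
A primer such as SEQ ID NO:2 can be checked against its stated length and scanned for restriction sites in its 5' tail. The Python sketch below is an illustrative addition: the enzyme names and recognition sequences are standard values supplied here as annotation (this part of the listing does not name them), and the function name is hypothetical.

```python
# SEQ ID NO:2 as printed in the listing, with the spacing removed.
PRIMER = "GGACTAGTCTGCAGTCTAGAGCTCCATGGCGCCCATCACGGCGTACG"

# Standard recognition sequences; the enzyme choice is an assumption
# made for illustration, not something stated in the source.
SITES = {
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "XbaI": "TCTAGA",
    "SacI": "GAGCTC",
    "NcoI": "CCATGG",
}


def find_sites(sequence, sites):
    """Return {enzyme: 1-based position of the first match} for each
    recognition sequence that occurs in `sequence`."""
    hits = {}
    for enzyme, motif in sites.items():
        index = sequence.find(motif)
        if index != -1:
            hits[enzyme] = index + 1
    return hits


print("length:", len(PRIMER))
for enzyme, position in find_sites(PRIMER, SITES).items():
    print(f"{enzyme} site starts at base {position}")
```

The scan reports only the first occurrence of each motif, which is enough for a primer of this size; overlapping sites (for example, a site that shares bases with its neighbor) are reported independently.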

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 44 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide primer"

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CGGCTGCACC TCCAGCAGTG CATTTTAGAT CTTAAGTCTA GAAG 44 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer"

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
AAGCTACTCT ACCTTCTCAC GATTTTAGAT CTTAAGTCTA GAAG 44 
(2) INFORMATION FOR SEQ ID NO: 5: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GGATCTCTGT TGGTACTCTA GGATTTTAGA TCTTAAGTCT AGAAG 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 64 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE DUPLEX"

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..4

(D) OTHER INFORMATION: /product= "SINGLE STRANDED REGION
ON CODING STRAND"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 61..64

(D) OTHER INFORMATION: /product= "SINGLE STRANDED REGION
ON COMPLEMENTARY STRAND"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

TGCACGGCGC CGACACGGAA GATGTCGTGT GCTGCTCAAT GTCTTATACC TGGACAGGCG 

TGCA 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 
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(v) FRAGMENT TYPE: internal 
<Xi> SEQUENCE DESCRIPTION: SEQ ID NO:7: 

Gly Ala Asp Thr Glu Asp Val Val Cys Cys Ser Met Ser Tyr Thr Trp 
1 5 10 15 

Thr Gly Val His
20 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GGACTAGTCT GCAGTCTAGA GCTCCATGGC GCCCATCACG GCGTACG 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGGGAGCCGG AGGACGTCTG GCGCAGG 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1497 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 
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(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 87.. 893 

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 426..427

(D) OTHER INFORMATION: /label= ApaLIsite



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

ACCAACCTCT TCGAGGCACA AGGCACAACA GGCTGCTCTG GGATTCTCTT CAGCCAATCT 60

TCATTGCTCA AGTGTCTGAA GCAGCC ATG GCA GAA GTA CCT GAG CTC GCC AGT 113

Met Ala Glu Val Pro Glu Leu Ala Ser
1 5

GAA ATG ATG GCT TAT TAC AGT GGC AAT GAG GAT GAC TTG TTC TTT GAA 161
Glu Met Met Ala Tyr Tyr Ser Gly Asn Glu Asp Asp Leu Phe Phe Glu
10 15 20 25

GCT GAT GGC CCT AAA CAG ATG AAG TGC TCC TTC CAG GAC CTG GAC CTC 209
Ala Asp Gly Pro Lys Gln Met Lys Cys Ser Phe Gln Asp Leu Asp Leu
30 35 40

TGC CCT CTG GAT GGC GGC ATC CAG CTA CGA ATC TCC GAC CAC CAC TAC 257
Cys Pro Leu Asp Gly Gly Ile Gln Leu Arg Ile Ser Asp His His Tyr
45 50 55

AGC AAG GGC TTC AGG CAG GCC GCG TCA GTT GTT GTG GCC ATG GAC AAG 305
Ser Lys Gly Phe Arg Gln Ala Ala Ser Val Val Val Ala Met Asp Lys
60 65 70

CTG AGG AAG ATG CTG GTT CCC TGC CCA CAG ACC TTC CAG GAG AAT GAC 353
Leu Arg Lys Met Leu Val Pro Cys Pro Gln Thr Phe Gln Glu Asn Asp
75 80 85

CTG AGC ACC TTC TTT CCC TTC ATC TTT GAA GAA GAA CCT ATC TTC TTC 401
Leu Ser Thr Phe Phe Pro Phe Ile Phe Glu Glu Glu Pro Ile Phe Phe
90 95 100 105

GAC ACA TGG GAT AAC GAG GCT TAT GTG CAC GAT GCA CCT GTA CGA TCA 449
Asp Thr Trp Asp Asn Glu Ala Tyr Val His Asp Ala Pro Val Arg Ser
110 115 120

CTG AAC TGC ACG CTC CGG GAC TCA CAG CAA AAA AGC TTG GTG ATG TCT 497
Leu Asn Cys Thr Leu Arg Asp Ser Gln Gln Lys Ser Leu Val Met Ser
125 130 135

GGT CCA TAT GAA CTG AAA GCT CTC CAC CTC CAG GGA CAG GAT ATG GAG 545
Gly Pro Tyr Glu Leu Lys Ala Leu His Leu Gln Gly Gln Asp Met Glu
140 145 150

CAA CAA GTG GTG TTC TCC ATG TCC TTT GTA CAA GGA GAA GAA AGT AAT 593
Gln Gln Val Val Phe Ser Met Ser Phe Val Gln Gly Glu Glu Ser Asn
155 160 165

GAC AAA ATA CCT GTG GCC TTG GGC CTC AAG GAA AAG AAT CTG TAC CTG 641
Asp Lys Ile Pro Val Ala Leu Gly Leu Lys Glu Lys Asn Leu Tyr Leu
170 175 180 185

TCC TGC GTG TTG AAA GAT GAT AAG CCC ACT CTA CAG CTG GAG AGT GTA 689
Ser Cys Val Leu Lys Asp Asp Lys Pro Thr Leu Gln Leu Glu Ser Val
190 195 200

GAT CCC AAA AAT TAC CCA AAG AAG AAG ATG GAA AAG CGA TTT GTC TTC 737
Asp Pro Lys Asn Tyr Pro Lys Lys Lys Met Glu Lys Arg Phe Val Phe
205 210 215

AAC AAG ATA GAA ATC AAT AAC AAG CTG GAA TTT GAG TCT GCC CAG TTC 785
Asn Lys Ile Glu Ile Asn Asn Lys Leu Glu Phe Glu Ser Ala Gln Phe
220 225 230

CCC AAC TGG TAC ATC AGC ACC TCT CAA GCA GAA AAC ATG CCC GTC TTC 833
Pro Asn Trp Tyr Ile Ser Thr Ser Gln Ala Glu Asn Met Pro Val Phe
235 240 245

CTG GGA GGG ACC AAA GGC GGC CAG GAT ATA ACT GAC TTC ACC ATG CAA 881
Leu Gly Gly Thr Lys Gly Gly Gln Asp Ile Thr Asp Phe Thr Met Gln
250 255 260 265

TTT GTG TCT TCC TAAAGAGAGC TGTACCCAGA GAGTCCTGTG CTGAATGTGG 933
Phe Val Ser Ser

ACTCAATCCC TAGGGCTGGC AGAAAGGGAA CAGAAAGGTT TTTGAGTACG GCTATAGCCT 993 

GGACTTTCCT GTTGTCTACA CCAATGCCCA ACTGCCTGCC TTAGGGTAGT GCTAAGAGGA 1053 

TCTCCTGTCC ATCAGCCAGG ACAGTCAGCT CTCTCCTTTC AGGGCCAATC CCCAGCCCTT 1113 

TTGTTGAGCC AGGCCTCTCT CACCTCTCCT ACTCACTTAA AGCCCGCCTG ACAGAAACCA 1173 

CGGCCACATT TGGTTCTAAG AAACCCTCTG TCATTCGCTC CCACATTCTG ATGAGCAACC 1233 

GCTTCCCTAT TTATTTATTT ATTTGTTTGT TTGTTTTATT CATTGGTCTA ATTTATTCAA 1293 

AGGGGGCAAG AAGTAGCAGT GTCTGTAAAA GAGCCTAGTT TTTAATAGCT ATGGAATCAA 1353 

TTCAATTTGG ACTGGTGTGC TCTCTTTAAA TCAAGTCCTT TAATTAAGAC TGAAAATATA 1413 

TAAGCTCAGA TTATTTAAAT GGGAATATTT ATAAATGAGC AAATATCATA CTGTTCAATG 1473 

GTTCTGAAAT AAACTTCTCT GAAG 1497 
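
The pairing of codons and amino acids in SEQ ID NO:10 can be reproduced with the standard genetic code. The Python sketch below is an illustrative addition, not part of the listing: it builds the standard codon table, translates the first nine codons and the last four codons of the annotated CDS (copied from the listing above), and checks that the CDS span 87..893 corresponds to 269 codons, the length given for SEQ ID NO:11. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) in frame 1, returning
    one-letter amino acids and stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First nine codons of the CDS (positions 87..113 of SEQ ID NO:10).
cds_start = "ATG GCA GAA GTA CCT GAG CTC GCC AGT"
# Last four codons of the CDS followed by the bases that come after it.
cds_end = "TTT GTG TCT TCC TAA AGA"

print(translate(cds_start))   # Met Ala Glu Val Pro Glu Leu Ala Ser
print(translate(cds_end))     # Phe Val Ser Ser, then a stop codon

# CDS 87..893 is inclusive, so its length in codons is:
print((893 - 87 + 1) // 3)
```

One-letter codes are used only to keep the table compact; they map directly onto the three-letter codes printed in the listing (M = Met, A = Ala, and so on).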

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 269 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ala Glu Val Pro Glu Leu Ala Ser Glu Met Met Ala Tyr Tyr Ser
1 5 10 15

Gly Asn Glu Asp Asp Leu Phe Phe Glu Ala Asp Gly Pro Lys Gln Met
20 25 30

Lys Cys Ser Phe Gln Asp Leu Asp Leu Cys Pro Leu Asp Gly Gly Ile
35 40 45

Gln Leu Arg Ile Ser Asp His His Tyr Ser Lys Gly Phe Arg Gln Ala
50 55 60

Ala Ser Val Val Val Ala Met Asp Lys Leu Arg Lys Met Leu Val Pro
65 70 75 80

Cys Pro Gln Thr Phe Gln Glu Asn Asp Leu Ser Thr Phe Phe Pro Phe
85 90 95

Ile Phe Glu Glu Glu Pro Ile Phe Phe Asp Thr Trp Asp Asn Glu Ala
100 105 110

Tyr Val His Asp Ala Pro Val Arg Ser Leu Asn Cys Thr Leu Arg Asp
115 120 125

Ser Gln Gln Lys Ser Leu Val Met Ser Gly Pro Tyr Glu Leu Lys Ala
130 135 140

Leu His Leu Gln Gly Gln Asp Met Glu Gln Gln Val Val Phe Ser Met
145 150 155 160

Ser Phe Val Gln Gly Glu Glu Ser Asn Asp Lys Ile Pro Val Ala Leu
165 170 175

Gly Leu Lys Glu Lys Asn Leu Tyr Leu Ser Cys Val Leu Lys Asp Asp
180 185 190

Lys Pro Thr Leu Gln Leu Glu Ser Val Asp Pro Lys Asn Tyr Pro Lys
195 200 205

Lys Lys Met Glu Lys Arg Phe Val Phe Asn Lys Ile Glu Ile Asn Asn
210 215 220

Lys Leu Glu Phe Glu Ser Ala Gln Phe Pro Asn Trp Tyr Ile Ser Thr
225 230 235 240

Ser Gln Ala Glu Asn Met Pro Val Phe Leu Gly Gly Thr Lys Gly Gly
245 250 255

Gln Asp Ile Thr Asp Phe Thr Met Gln Phe Val Ser Ser
260 265

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: 
CTCGGCCTCC TGCAGGCACC TGTACGATCA CTGAAC 
(2) INFORMATION FOR SEQ ID NO: 13: 



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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer"

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GTTAAACACA GAAGGATTTT AGATCTTAAG GG 32 
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CLAIMS 

I claim: 

1. A method for assaying exogenous protease 
activity in a host cell comprising the steps of: 

(a) incubating a host cell transformed with a 
first nucleotide sequence encoding an exogenous protease and a 
second nucleotide sequence encoding an artificial polypeptide 
substrate; 

wherein said substrate comprises: 

(i) a cleavage site for said exogenous 

protease; and 

(ii) a polypeptide that is secreted out of 
said cell following cleavage by said exogenous protease; 

under conditions which cause said exogenous protease and said 
artificial substrate to be expressed; 

(b) separating said host cell from its growth 
media under non-lytic conditions; and 

(c) assaying said growth media for the 
presence of said secreted polypeptide. 

2. A method for assaying endogenous protease 
activity in a host cell comprising the steps of: 

(a) incubating a host cell transformed with a 
nucleotide sequence encoding an artificial polypeptide 
substrate; 

wherein said substrate comprises: 

(i) a cleavage site for said endogenous 

protease; and 

(ii) a polypeptide that is secreted out of 
said cell following cleavage by said endogenous protease; 

under conditions which cause said artificial substrate to be 
expressed; 

(b) separating said host cell from its growth 
media under non-lytic conditions; and 

(c) assaying said growth media for the 
presence of said secreted polypeptide. 
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3. A method for identifying a compound as an 
inhibitor of a protease comprising the steps of: 

(a) assaying the activity of a protease in the 
absence of said compound by a method according to claim 1 or 
2; 

(b) assaying the activity of a protease in the 
presence of said compound by a method according to claim 1 or 

2, wherein said compound is added to the host cells during 
said incubation of said host cells; and 

(c) comparing the results of step (a) with the
results of step (b).

4. The method according to claim 1 or claim 3, 
insofar as it depends from claim 1, wherein said first
nucleotide sequence and said second nucleotide sequence encode 
a single polypeptide. 

5. The method according to claim 4, wherein said 
first and second nucleotide sequences encode NS3-4A-Δ4B-IL-1β.

6. The method according to any one of claims 1 to 

3, wherein said first nucleotide sequence encodes a viral 
protease or an enzymatically active fragment thereof. 

7. The method according to claim 6, wherein said 
first nucleotide sequence encodes hepatitis C virus NS3 
protease, an NS3-4A fusion protein or amino acids 1-180 of NS3 
protease. 

8. The method according to any one of claims 1 to 
3, wherein said secreted polypeptide is selected from 
polypeptides comprising mature IL-1β, mature IL-1α, basic
fibroblast growth factor and endothelial-monocyte activating 
polypeptide II. 

9. The method according to claim 8, wherein said 
secreted polypeptide comprises mature IL-1β.

10. The method according to claim 9, wherein said
artificial polypeptide substrate is selected from pre-IL-1β*
or pre-IL-1β(CSM).

11. A host cell transformed with a nucleotide 
sequence encoding an artificial polypeptide substrate, wherein 
said substrate comprises: 

(a) a cleavage site for said exogenous
protease; and

(b) a polypeptide that is secreted out of said 
cell following cleavage by said exogenous protease; 

said host cell being capable of expressing said protease and 
said substrate. 

12. A host cell transformed with a first nucleotide 
sequence encoding an exogenous protease and a second 
nucleotide sequence encoding an artificial polypeptide 
substrate, wherein said substrate comprises: 

(a) a cleavage site for said exogenous
protease; and

(b) a polypeptide that is secreted out of said 
cell following cleavage by said exogenous protease; 

said host cell being capable of expressing said protease and 
said substrate. 

13. The host cell according to claim 11 or 12, 
wherein said secreted polypeptide is selected from 
polypeptides comprising mature IL-1β, mature IL-1α, basic
fibroblast growth factor and endothelial-monocyte activating
polypeptide II.

14. The host cell according to claim 13, wherein
said secreted polypeptide comprises mature IL-1β.

15. The host cell according to claim 14, wherein
said artificial polypeptide substrate is selected from
pre-IL-1β* or pre-IL-1β(CSM).

16. The host cell according to claim 12, wherein 
said first nucleotide sequence and said second nucleotide 
sequence encode a single polypeptide. 

17. The host cell according to claim 16, wherein 
said first and second nucleotide sequences encode
NS3-4A-Δ4B-IL-1β.

18. The host cell according to claim 12, wherein
said first nucleotide sequence encodes a viral protease or an 
enzymatically active fragment thereof. 

19. The host cell according to claim 18, wherein 
said first nucleotide sequence encodes hepatitis C virus NS3 
protease, an NS3-4A fusion protein or amino acids 1-180 of NS3 
protease. 

20. The host cell according to claim 11 or 12, 
selected from E. coli, Bacillus, other bacteria, yeast and
other fungi, plant cells, insect cells and mammalian cells.

21. The host cell according to claim 20, wherein 
said host cell is a mammalian cell. 

22. The host cell according to claim 21, wherein 
said host cell is a COS cell. 



23. A recombinant DNA molecule comprising a DNA 
sequence encoding an artificial substrate selected from
pre-IL-1β* and pre-IL-1β(CSM).